28/10, 2021

Reproducibility and metadata

What is reproducible research?

  • Same data, same code, same results
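
A minimal sketch of "same data, same code, same results", assuming the analysis involves random resampling. The names `fingerprint` and `bootstrap_mean`, the seed value, and the toy data are illustrative assumptions, not part of the original slides. The checksum pins down the exact input data, and the seeded generator makes the random step repeat exactly.

```python
import hashlib
import random


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest that identifies the exact input data."""
    return hashlib.sha256(data).hexdigest()


def bootstrap_mean(values, n_resamples=1000, seed=42):
    """Bootstrap estimate of the mean using a private, seeded generator."""
    rng = random.Random(seed)  # fixed seed: the same resamples on every run
    means = []
    for _ in range(n_resamples):
        sample = [rng.choice(values) for _ in values]
        means.append(sum(sample) / len(sample))
    return sum(means) / len(means)


# Stand-in for the contents of a raw data file (hypothetical values).
raw = b"1.2,3.4,2.2,5.1\n"
values = [float(x) for x in raw.decode().strip().split(",")]

first = bootstrap_mean(values)
second = bootstrap_mean(values)

# Same data (same fingerprint) and same code give the same result.
print(fingerprint(raw)[:12], first == second)
```

Recording the fingerprint next to the results is one way to check later that a re-run really used the same raw data.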

What is reproducible research?

  • Code, data (raw data), and text intertwined
  • In R: R Markdown (Rmd)
  • In Python: Jupyter Notebooks

When it is not reproducible

Reproducibility in R

  1. One folder
    • Raw data (csv, xls, html, json, images, pdf)
    • Code and text (Rmd, Shiny app, md, .R)
    • Results (manuscript, web page, web app)
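
The one-folder layout above could be set up along these lines. This is only a sketch: the sub-folder names `data_raw`, `code`, and `results`, the project name `my_project`, and the helper `create_project` are hypothetical, since the slide only says that raw data, code and text, and results live together in one folder.

```python
import tempfile
from pathlib import Path

# Hypothetical sub-folder names, one per kind of content on the slide.
LAYOUT = ("data_raw", "code", "results")


def create_project(root):
    """Create one project folder with a sub-folder per kind of content."""
    created = []
    for name in LAYOUT:
        folder = Path(root) / name
        folder.mkdir(parents=True, exist_ok=True)  # safe to run repeatedly
        created.append(folder)
    return created


# Throwaway location for the demo; a real project would use a fixed folder.
project = Path(tempfile.mkdtemp()) / "my_project"
folders = create_project(project)
print(sorted(p.name for p in folders))
```

Keeping everything under one root means scripts can refer to files by relative paths, so the folder still works after it is moved or shared.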

Some examples

Why do reproducible research?

Reproducible research benefits those who do it

  • You can redo your research
  • You can easily re-analyze when you have new input
  • You can easily share it and spend less time teaching others how to do it
  • More citations (McKiernan et al. 2016)

GitHub

  • Similar to “Google Drive” or “Dropbox” for code
  • Version control (we can come back to any prior version)
  • Can be used from the command line or through GUIs
  • Each project is a repo (repository)
  • Workshop next hour (make sure you create your account beforehand)

Limitations of GitHub

  • Not great for big data
    • Limit per file: 100 MB
    • Recommended limit per repo: 1 GB
  • We can work around that with DVC (Data Version Control)
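
Before pushing a project, it can help to check for files above the per-file limit. The sketch below assumes the limit is 100 MiB (100 × 1024 × 1024 bytes); the function name `oversized_files` and the demo files are illustrative assumptions. Anything it reports is a candidate for tracking with DVC instead of plain Git.

```python
import os
import tempfile
from pathlib import Path

# Assumed per-file limit, taken from the "100 MB" figure on the slide.
FILE_LIMIT_BYTES = 100 * 1024 * 1024


def oversized_files(root, limit=FILE_LIMIT_BYTES):
    """Return sorted (path, size) pairs for files under root larger than limit."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip Git's internal folder; its contents are not project files.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = Path(dirpath) / name
            size = path.stat().st_size
            if size > limit:
                found.append((path, size))
    return sorted(found)


# Demo on a throwaway folder with a tiny limit, so no large file is needed.
demo = Path(tempfile.mkdtemp())
(demo / "small.csv").write_bytes(b"a,b\n1,2\n")
(demo / "big.bin").write_bytes(b"\0" * 2048)

hits = oversized_files(demo, limit=1024)
print([p.name for p, _ in hits])
```

Calling `oversized_files(".")` from the project root uses the default limit.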

Version control

Metadata